
    Mixed Integer Linear Programming for Exact Finite-Horizon Planning in Decentralized POMDPs

    We consider the problem of finding an n-agent joint policy for the optimal finite-horizon control of a decentralized POMDP (Dec-POMDP). This is a problem of very high complexity (NEXP-hard for n >= 2). In this paper, we propose a new mathematical programming approach to the problem. Our approach is based on two ideas. First, we represent each agent's policy in the sequence form rather than the tree form, thereby obtaining a very compact representation of the set of joint policies. Second, using this compact representation, we cast the problem as an instance of combinatorial optimization, for which we formulate a mixed integer linear program (MILP). The optimal solution of the MILP directly yields an optimal joint policy for the Dec-POMDP. Computational experience shows that the MILP approach solves benchmark Dec-POMDP problems significantly faster than existing algorithms: for example, it solves the multi-agent tiger problem for horizon 4 in 72 seconds, whereas existing algorithms require several hours.
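
    The paper's actual MILP encodes full sequence-form policies over the horizon; as a rough flavour of the approach only, here is a toy PuLP sketch (one decision per agent, made-up reward numbers, all names ours): binary variables select each agent's action, and auxiliary joint variables linearize the product of the two selections, which is the standard MILP trick.

```python
# Toy illustration, NOT the paper's formulation: two agents, one decision
# each, known joint reward table.  Auxiliary binaries z linearize the
# product of the two agents' independent action choices.
import pulp

actions = ["listen", "open"]
# Hypothetical joint reward table R[(a1, a2)], for illustration only.
R = {("listen", "listen"): 1.0, ("listen", "open"): -2.0,
     ("open", "listen"): -2.0, ("open", "open"): 5.0}

prob = pulp.LpProblem("toy_joint_policy", pulp.LpMaximize)
x = {a: pulp.LpVariable(f"x_{a}", cat=pulp.LpBinary) for a in actions}  # agent 1
y = {a: pulp.LpVariable(f"y_{a}", cat=pulp.LpBinary) for a in actions}  # agent 2
z = {(a, b): pulp.LpVariable(f"z_{a}_{b}", cat=pulp.LpBinary)
     for a in actions for b in actions}                                  # joint choice

prob += pulp.lpSum(R[ab] * z[ab] for ab in z)    # expected reward objective
prob += pulp.lpSum(x.values()) == 1              # each agent picks one action
prob += pulp.lpSum(y.values()) == 1
for a in actions:                                # z must agree with the marginals
    prob += pulp.lpSum(z[(a, b)] for b in actions) == x[a]
for b in actions:
    prob += pulp.lpSum(z[(a, b)] for a in actions) == y[b]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = [ab for ab in z if pulp.value(z[ab]) > 0.5]
print("joint action:", chosen, "value:", pulp.value(prob.objective))
```

    With the toy numbers above, the solver selects the (open, open) joint action with value 5.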

    Computing the Equilibria of Bimatrix Games using Dominance Heuristics

    We propose a formulation of a general-sum bimatrix game as a bipartite directed graph, with the objective of establishing a correspondence between the relevant structures of the graph (in particular, its elementary cycles) and the set of Nash equilibria of the game. We show that finding the elementary cycles of the graph permits the computation of the set of equilibria. For games whose graphs have a sparse adjacency matrix, this serves as a good heuristic for computing the set of equilibria. The heuristic also allows discarding regions of the support space that cannot yield an equilibrium, thus serving as a useful pre-processing step for algorithms that compute equilibria through support enumeration.
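
    The paper's exact graph construction is not reproduced here; as an illustrative cousin of the idea (toy game and all names are our assumptions), the sketch below builds the best-response digraph of a bimatrix game with networkx, in which 2-cycles, i.e. mutual best responses, are exactly the pure Nash equilibria. networkx's simple_cycles enumerates elementary cycles via Johnson's algorithm.

```python
# Illustrative only: elementary cycles of a best-response digraph.
import numpy as np
import networkx as nx

A = np.array([[3, 0], [5, 1]])   # row player's payoffs (toy Prisoner's Dilemma)
B = np.array([[3, 5], [0, 1]])   # column player's payoffs

G = nx.DiGraph()
for i in range(A.shape[0]):      # edge: row strategy i -> its best column replies
    for j in np.flatnonzero(B[i] == B[i].max()):
        G.add_edge(("row", i), ("col", j))
for j in range(A.shape[1]):      # edge: column strategy j -> its best row replies
    for i in np.flatnonzero(A[:, j] == A[:, j].max()):
        G.add_edge(("col", j), ("row", i))

for cycle in nx.simple_cycles(G):
    if len(cycle) == 2:          # mutual best responses = pure Nash equilibrium
        print("pure Nash equilibrium:", cycle)
```

    For the toy game above, the only 2-cycle is mutual defection, the game's unique pure equilibrium.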

    Online Learning with Noise: A Kernel-Based Policy-Gradient Approach

    Various forms of noise are present in the brain. We cast the role of noise in the exploration/exploitation trade-off into the framework of reinforcement learning, for a complex motor-learning task. A neuro-controller applying a linear transformation to its input, to which Gaussian noise is added, is modelled as a stochastic controller that can be learned online in a direct policy-gradient scheme. The reward signal is derived from sensor information, so no direct or indirect model of the controlled system is needed. The chosen task (reaching with a multi-joint arm) is redundant and non-linear. The controller inputs are projected into a higher-dimensional feature space using a topographic coding based on Gaussian kernels. We show that, with a suitable noise level, it is possible to explore the environment so as to find good control solutions that can then be exploited. Moreover, the controller is able to adapt continuously to changes in the system dynamics. The general framework of this work allows the study of various kinds of noise and their effects, and is compatible with more complex types of stochastic neuro-controllers, as demonstrated by other work on binary or spiking networks.
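
    A minimal sketch of the scheme as we read it (1-D toy reaching task, REINFORCE-style update; all constants and details are our assumptions): the state is projected onto a grid of Gaussian kernels, the controller is linear in those features, and Gaussian noise on the output drives exploration.

```python
# Minimal sketch: kernel-based direct policy gradient with Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
centers = np.linspace(-1.0, 1.0, 15)        # topographic grid of kernel centers
sigma_k, sigma_n, alpha = 0.15, 0.2, 0.05   # kernel width, noise level, step size
w = np.zeros_like(centers)                  # linear controller weights

def features(x):
    # project the raw input onto Gaussian kernels (topographic coding)
    return np.exp(-((x - centers) ** 2) / (2 * sigma_k ** 2))

target = 0.7
for episode in range(2000):
    x = rng.uniform(-1.0, 1.0)              # random start position
    phi = features(x)
    mean_u = w @ phi                        # deterministic part of the command
    u = mean_u + sigma_n * rng.normal()     # Gaussian exploration noise
    reward = -abs((x + u) - target)         # sensor-based reward, no system model
    # REINFORCE gradient for a Gaussian policy: (u - mean) / sigma^2 * phi
    w += alpha * reward * (u - mean_u) / sigma_n ** 2 * phi

print("learned command at x=0:", w @ features(0.0), "(ideal:", target, ")")
```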

    Guido and Am I Robot? A Case Study of Two Robotic Artworks Operating in Public Spaces

    This article is a case study of two artworks that were commissioned for and exhibited in art venues in 2016 and 2017. The first artwork, Guido the Robot Guide, guided visitors around an art-science exhibition, presenting the exhibits from a robot's perspective. Guido was the result of a collaboration between artists and engineers. The concept was an irreverent robot guide that could switch transparently from autonomous mode to operator control, allowing for seamless natural interaction. We examine how the project unfolded, along with its successes and limitations. Following on from Guido, the lead artist developed the robotic installation Am I Robot?, in which the idea of a hybrid autonomous/remote-manual mode was implemented fully in a non-utilitarian machine that was exhibited in several art galleries. The article provides a concise contextualisation, details technical and design aspects, and reports observations of visitors' interactions with the artworks. We evaluate the hybrid system's potential for creative robotics applications and identify directions for future research.

    Dynamic reservoir for developmental reinforcement learning

    We present in this paper an original neural architecture based on a Dynamic Self-Organizing Map (DSOM). In a reservoir computing paradigm, this architecture is used as a function approximator in a reinforcement learning setting where the state × action space is difficult to handle. The life-long online learning property of the DSOM allows us to take a developmental approach to learning a robotic task: the perception and motor skills of the robot can grow in richness and complexity during learning. As this work is largely in progress, valid and sound results are not yet available.
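
    As a flavour of the building block, here is a compact sketch of a DSOM update step (after Rougier and Boniface's formulation; map size and constants are illustrative, not the paper's): the neighbourhood width scales with the winner's quantization error, so the map keeps adapting throughout its life instead of freezing under a decay schedule.

```python
# Sketch of one DSOM update; constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)
grid = np.stack(np.meshgrid(np.arange(8), np.arange(8)), -1).reshape(-1, 2) / 7.0
W = rng.uniform(0.0, 1.0, (64, 2))          # one prototype per map node
eps, eta = 0.1, 0.5                          # learning rate, elasticity

def dsom_step(W, x):
    d = np.linalg.norm(W - x, axis=1)
    s = d.argmin()                           # best-matching unit
    if d[s] == 0.0:
        return
    # elasticity: the neighbourhood shrinks when the winner already matches
    # the input well, so learning never stops (no time-decay schedule)
    h = np.exp(-np.sum((grid - grid[s]) ** 2, axis=1) / (eta ** 2 * d[s] ** 2))
    W += eps * d[:, None] * h[:, None] * (x - W)

for _ in range(5000):                        # stream of (state, action) samples
    dsom_step(W, rng.uniform(0.0, 1.0, 2))
```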

    "RĂ©servoir computing" et Apprentissage par Renforcement DĂ©veloppemental

    In this article, we present an original reinforcement learning architecture that relies on a dynamic self-organizing neural map to give learning a developmental component. Following the reservoir computing scheme, the neural map learns an approximation of the value function in a setting where the state × action space is large. To cope with the complexity inherent in the size of the problem, we propose a developmental approach in which the richness and complexity of the motor and perceptual spaces are increased as the learning agent's performance improves. We detail the contributions of this proposal and the questions it raises, and we illustrate our architecture by testing it on a relatively simple robotic task.
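
    As an illustration of the developmental schedule (entirely our assumption, not the paper's implementation), the sketch below grows the resolution of a discretized motor space whenever the agent masters the current stage; refining from n to 2n-1 points preserves the previously learned commands.

```python
# Hypothetical developmental schedule for a discretized motor space.
import numpy as np

class DevelopmentalActions:
    def __init__(self, low=-1.0, high=1.0, start=3, max_n=33):
        self.low, self.high, self.n, self.max_n = low, high, start, max_n

    def actions(self):
        return np.linspace(self.low, self.high, self.n)

    def maybe_grow(self, success_rate, threshold=0.8):
        # double the motor resolution once the current stage is mastered;
        # 2n-1 points keep every previously available command
        if success_rate >= threshold and self.n < self.max_n:
            self.n = min(2 * self.n - 1, self.max_n)

dev = DevelopmentalActions()
print(dev.actions())          # coarse stage: 3 motor commands
dev.maybe_grow(0.9)
print(dev.actions())          # refined stage: 5 commands, old ones preserved
```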

    Stochastic Models of Collective Decision-Making

    We present the mathematical framework of partially observable Markov decision processes, which, in artificial intelligence, formalizes the notion of reinforcement learning. In theory, these formal models also apply to collective systems composed of several agents. However, drawing on several pieces of work we have carried out, we point out certain limitations of these models, due either to implicit underlying assumptions or to the practical use of these theoretical tools. As we discuss, these limitations can be useful, among other things, for identifying or delimiting the cognitive functions that would be needed for "artificial" reinforcement learning to apply to problems that human beings can solve. This question is one of those we address in our discussion of the possible contributions of our field to the cognitive sciences.

    Reinforcement Learning Approaches to Instrumental Contingency Degradation in Rats

    Goal-directed action involves a representation of the consequences of an action. Rats with lesions of the medial prefrontal cortex do not adapt their instrumental response in a Skinner box when food delivery becomes unrelated to lever pressing. This indicates a role for the prefrontal region in adapting to contingency changes, a form of causal learning. We attempted to model this phenomenon in a reinforcement learning framework. Behavioural sequences of normal and lesioned rats were used to feed models based on the SARSA algorithm. One model (factorized states) focused on temporal factors, representing continuous states as vectors of decaying event traces. The second model (event sequences) emphasized sequences, representing states as n-tuples of events. The values of state-action pairs were fed into a softmax policy to derive predicted action probabilities and adjust model parameters. Both models revealed a number of discrepancies between predicted and actual behaviour, emphasising changes in magazine visits rather than in lever presses. The models also did not reproduce the differential adaptation of normal and prefrontal-lesioned rats to contingency degradation. These data suggest that temporal-difference learning models fail to capture the causal relationships involved in adaptation to contingency changes.
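
    A minimal sketch of the modelling machinery named above (the state coding, whether decaying event traces or event n-tuples, is abstracted into an integer state here; sizes, constants, and the toy environment are our assumptions): tabular SARSA updates plus a softmax policy whose predicted action probabilities are what one would compare with observed behaviour when fitting parameters.

```python
# Sketch: SARSA value learning with a softmax action policy.
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 4, 3                  # e.g. lever press, magazine visit, other
Q = np.zeros((n_states, n_actions))
alpha, gamma, beta = 0.1, 0.9, 2.0          # step size, discount, inverse temperature

def softmax_policy(s):
    p = np.exp(beta * (Q[s] - Q[s].max()))  # max-shift for numerical stability
    p /= p.sum()
    return rng.choice(n_actions, p=p), p    # sampled action, predicted probabilities

def sarsa_update(s, a, r, s2, a2):
    Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])

# toy drive: random transitions stand in for the Skinner-box data stream
s = 0
a, _ = softmax_policy(s)
for _ in range(1000):
    s2 = rng.integers(n_states)
    r = float(a == 0 and s2 == 1)           # e.g. reward when a lever press pays off
    a2, p = softmax_policy(s2)
    sarsa_update(s, a, r, s2, a2)
    s, a = s2, a2
```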

    Reinforcement Learning and Stochastic Games with Incomplete Information

    The goal of our work is to enable agents to learn to cooperate. Since each agent is autonomous and necessarily different from the others, this is a particularly difficult task, especially when the agents' goals are not exactly the same. Our concern is to work with agents that are as simple as possible, that is, essentially reactive. We therefore propose to endow the agents with limited communication capabilities in order to implement a notion similar to the "contracts" of game theory. If the agents agree on this notion of contract, our algorithm allows them to converge towards equilibria that induce "more cooperative" behaviours than a plain Nash equilibrium.

    Efficient Learning in Games

    We consider the problem of learning strategy selection in games. The theoretical solution to this problem is a distribution over strategies that corresponds to a Nash equilibrium of the game. When the payoff function of the game is not known to the participants, such a distribution must be approximated directly through repeated play. Full knowledge of the payoff function, on the other hand, restricts agents to being strictly rational. In this classical approach, agents are bound to a Nash equilibrium even when a globally better solution is obtainable. In this paper, we present an algorithm that allows agents to capitalize on their very lack of information about the payoff structure. The principle we propose is that agents resort to manipulating their own payoffs during the course of learning, so as to find a "game" that gives them a higher payoff than when no manipulation occurs. In essence, the payoffs are considered an extension of the strategy set. At all times, agents remain rational vis-à-vis the information available. In self-play, the algorithm affords a globally efficient payoff (if one exists).
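
    Illustrative only, not the paper's algorithm: in the Prisoner's Dilemma sketch below (toy payoffs, candidate penalty values all assumed), an agent that treats a self-imposed penalty on defection as part of its strategy shifts its best response from defection to cooperation, earning the higher true symmetric payoff the abstract alludes to.

```python
# Hypothetical sketch: an internal payoff offset as an extended strategy.
import numpy as np

A = np.array([[3, 0], [5, 1]])                # true row payoffs: C/D vs C/D

def best_response(payoffs, opponent_action):
    return payoffs[:, opponent_action].argmax()

for penalty in (0.0, 3.0):                    # candidate self-manipulations
    shaped = A.astype(float).copy()
    shaped[1, :] -= penalty                   # penalize one's own defection
    br = best_response(shaped, opponent_action=0)
    true_payoff = A[br, br]                   # symmetric self-play outcome
    print(f"penalty={penalty}: plays {'CD'[br]}, true payoff {true_payoff}")
```

    With no manipulation the agent defects and earns 1 in self-play; with the self-imposed penalty it cooperates and earns 3, a "game" that pays better than the unmanipulated one.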